(C) 1989,90,91,92,93 NONSTOP NETWORKS LIMITED.
***PREPARE FOR A CRASH!***
It is inevitable that your Primary Server will someday crash.
No*STOP NETWORK, of course, will keep you running on the Secondary.
It does not, however, automatically provide Secondary Server login
capability. You must do this yourself by creating login scripts in
advance on the Secondary Server for use when your Primary Server is
down. These login scripts should, in most cases, be the same as the
login scripts that were on your Primary Server before you installed
No*STOP NETWORK. Only the server name will be different. This ensures
that users who were not on the system at the time of the crash, or
those who were on the system but subsequently logged out, will be
able to login to the Secondary Server and use it while the Primary
Server is being repaired. [REMEMBER TO RUN RECOVERY (SEC->PRIM)
AFTER THE PRIMARY HAS BEEN REPAIRED AND BEFORE MIRRORING IS
RESTARTED]
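A minimal sketch of what such a Secondary login script might look
like, assuming hypothetical server names PRIM and SEC and a
SYS:APPS directory (your server names and mappings will differ):

```
REM Secondary Server system login script (sketch; names are assumptions)
REM Same as the Primary script, with only the server name changed
MAP F:=SEC/SYS:
MAP G:=SEC/SYS:APPS
WRITE "You are logged in to the Secondary Server SEC"
```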
************USER CONTROL***************************
OK, so you took the advice above so your users can access the Secondary
Server while the Primary Server is down for repairs. So what's to keep
them from logging on to the Secondary Server when the Primary Server is
up? That could be a very bad situation. What is needed is a way to keep
users from logging in to the Secondary unless the Primary is off-line.
The example below is for users of Netware 3.x. The GOTO command is not
supported for Netware 2.x. The lines in the example should be inserted
at the beginning of the Secondary Server system login script. It keeps
all users except the SUPERVISOR from logging in to the Secondary Server
if the Primary Server is up. In this example, the SUPERVISOR is allowed
in under the presumption that supervisors need to get in to do non-
mirrored maintenance tasks. To lock the SUPERVISOR out as well, delete the
first line.
IF LOGIN_NAME == "SUPERVISOR" THEN GOTO OK2
ATTACH (primary_server_name)
IF ERROR_LEVEL != "0" THEN GOTO OK
FIRE PHASERS 5 TIMES
WRITE "YOU CANNOT LOGIN TO (secondary_server_name)!!!"
[At this point you should delete all mappings established
by the system login script]
EXIT
OK:
WRITE "OK, SINCE (primary_server_name) IS DOWN, YOU MAY USE (secondary_name)"
OK2:
[The normal login script begins here]
We would appreciate any ideas for doing this more easily or better.
***Handles Tip ***
As explained in the Manual, if your applications use 20 file
handles without us, each file that is open concurrently and
mirrored will consume one additional handle while we are running.
You are therefore advised to increase the "FILES=" parameter in
CONFIG.SYS to cover the number of originals plus the number of
mirrors. If that number is then exceeded, NOSTOP will abort the
process. To avoid being aborted and instead let the application
process the error, set the "FILES=" parameter to some number less
than the total needed. If this is done, DOS will run out of
handles before NOSTOP does, and the normal DOS error will be
passed to the application. When doing this, you may get our
"MIRROR MISMATCH" error message. In this case, decrease or
increase the "FILES=" parameter by one and retry.
Related to the above instruction: If you are getting "MIRROR
MISMATCH" errors when you know there is no actual mismatch, the
problem probably is caused by the "FILES=" parameter in CONFIG.SYS
being set too low.
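As a worked illustration of the arithmetic above (the counts are
hypothetical): an application that keeps 20 files open
concurrently, all of them mirrored, needs roughly 20 original
handles plus 20 mirror handles. A CONFIG.SYS setting that covers
the full total might look like:

```
REM CONFIG.SYS sketch - adjust the counts for your own site
REM 20 originals + 20 mirrors + 5 spare for DOS itself
FILES=45
```

Setting FILES= a little below the total (e.g., FILES=39 here) is
the technique described above for letting the application, rather
than NOSTOP, handle the out-of-handles error.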
***RECOVERY OF APPLICATION PROGRAM FILES***
There are three methods for installing applications in preparation
for mirroring:
#1 Turn on No*STOP NETWORK and mirror the installation
#2 Install on each server separately
#3 Install on one server and copy application
directories to the other server. This is best done
using the RECOVERY utility included on the No*STOP
NETWORK diskette.
In almost all cases, any method will work. In the case where
an application requires separate installation on each server
(method #2), it is likely that server-unique data and/or
structures are created during installation. In this case,
straightforward use of the recovery utilities will not be
adequate when recovering from a server failure. In the worst case,
the application will have to be re-installed on the recovered
server. It may be possible in some cases, however, to prepare
in advance for recovery by performing the following steps:
1. Install on the Primary Server to subdirectory \[app].
2. Install on the Secondary Server to subdirectory \[app].
3. Copy from Primary:[app] to Secondary:[appB].
4. Copy from Secondary:[app] to Primary:[appB].
5. When a server fails, run the recovery utilities.
6. Copy from "GOOD" Server:[appB] to "BAD" Server:[app].
7. And, to prepare for the next failure,
8. Copy from "GOOD" Server:[app] to "BAD" Server:[appB].
If this works, it will make re-installation unnecessary.
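Assuming, purely for illustration, that drive F: maps to the
Primary, drive G: maps to the Secondary, and the application lives
in \APP, steps 3 and 4 above might be performed with ordinary DOS
copies (a sketch, not a required method):

```
XCOPY F:\APP G:\APPB\ /S /E
XCOPY G:\APP F:\APPB\ /S /E
```

Steps 6 and 8 are the same commands with the "GOOD" and "BAD"
drives substituted as appropriate.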
***INCOMPATIBILITIES***
Until further notice, be advised that we are incompatible with the
following software:
LOTUS MAGELLAN - Total incompatibility.
WINDOWS SMARTDRIVE - When using the WINDOWS version of
SMARTDRIVE (3-10-92), you may experience problems
when a drive goes down, which can interfere with the
continuous processing functions of No*STOP NETWORK.
There is no problem with DOS 5 or 6 SMARTDRIVE.
MS-DOS 6.0 - When running WINDOWS (3.1 or WFW) under DOS 6.0,
your workstation may hang if you exit WINDOWS directly
after a server has failed. DOS 6 passed all of the other
tests in our validation suite with flying colors. Stay tuned.
Update: We have found that the Novell patches in DOSUP7 and
WINUP7 make the situation worse - the workstation hangs
immediately after downing the server.
Update: One of our validation tests hangs, EVEN UNMIRRORED,
when DBLSPACE is active. The test is a COBOL program running
on LANTASTIC 5.0. The error message is:
"PSLINEHF segment RT: Error 198 @ COBOL PC086D"
The test runs perfectly, mirrored or unmirrored, when
DBLSPACE is not active.
********************************************************************
HERE ARE SOME ADDITIONS TO THE NEXT VERSION OF THE MANUAL WHICH YOU
MAY FIND USEFUL.
********************************************************************
3.5 Avoid Split Network Danger
Used incorrectly, server mirroring can introduce a danger not
present in an un-mirrored environment - data can be corrupted
when a split network is created due to the failure or partial
inaccessibility of a server. This section will describe how a split
network occurs and will define the actions necessary to avoid
data corruption.
3.5.1 Local Area Network Cabling Topology
In a linear, "daisy-chain" network, the potential exists for what
is called a "split network". A split network is created when
servers that are not topologically adjacent are cut off from each
other by the failure of a link in the network, so that some
workstations can access one of the servers while other
workstations can access the other, with one or more workstations
unable to access both servers. When mirroring servers, a split
network condition can have
the result that some workstations are updating the data base on one
server, while other workstations are updating the data base on the other
server. This causes the versions of the data base on the two servers to
diverge. If the link is re-established and the servers are re-synchronized
by using No*STOP RECOVERY (or any other similar method), the updates
performed on one of the servers will be lost. For this reason, attention
should be given to the logical topology of your network when mirroring
servers.
There are two simple rules to follow for daisy chain networks:
o ALWAYS LOCATE YOUR SERVERS NEXT TO EACH
OTHER TOPOLOGICALLY,
o ALWAYS LOCATE YOUR SERVERS AT THE END OF
THE CHAIN (either end will do).
[sorry, figures unavailable]
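In their absence, the three topologies can be inferred from the
case descriptions that follow; these sketches are our
reconstruction, not the original figures:

```
Figure 1 (good): server1 --- server2 --- workstation1 --- workstation2
Figure 2 (bad):  server1 --- workstation1 --- workstation2 --- server2
Figure 3 (bad):  workstation1 --- server1 --- server2 --- workstation2
```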
Figure 1 illustrates a good topology:
A. The link between servers 1 and 2 is broken.
Both workstations continue to access server2.
B. The link between workstations 1 and 2 is broken:
Workstation1 continues to access both servers.
Workstation2 can access neither server.
C. The link between the servers and the workstations is
broken:
Neither workstation can access either server.
In all cases, the integrity of the data base is uncompromised.
Figure 2 illustrates a bad topology:
D. The link between server1 and workstation1 is broken:
Both workstations continue to access server2.
E. The link between workstation2 and server2 is broken:
Both workstations continue to access server1.
F. The link between workstations 1 and 2 is broken:
Workstation1 continues to access server1.
Workstation2 continues to access server2.
In cases D and E, the integrity of the data base is maintained.
In case F, the two servers will develop different versions of the
data base. If the two are not synchronized data corruption is
likely to occur. If they are synchronized, updates to one of the
servers will be lost.
Figure 3 illustrates another bad topology:
G. The link between workstation1 and server1 is broken:
Workstation1 can access neither server.
Workstation2 continues to access both servers.
H. The link between server2 and workstation2 is broken:
Workstation1 continues to access both servers.
Workstation2 can access neither server.
I. The link between servers 1 and 2 is broken:
Workstation1 continues to access server1.
Workstation2 continues to access server2.
In cases G and H, the integrity of the data base is maintained.
In case I, the two servers will develop different versions of the
data base. If the two are not synchronized data corruption is
likely to occur. If they are synchronized, updates to one of the
servers will be lost.
Ring network topologies, such as IBM Token Ring, do not
exhibit this design vulnerability.
3.5.2 Wide Area Network Connections
If your network has servers that are separated geographically
such that a communications link is required, it is, almost by
definition, a latent split network. If the link is cut, the users at
the separate sites will be updating their respective locally
resident servers without reference to the distant servers. The
word "almost" is used because it is still possible to put both
servers at the same end of the network, that is, at the same site.
In this case, if the communications link is broken, the users at
the non-servered site will lose access to both servers. Some
enterprises can live with this, others can not. Also, in this
configuration, any possible performance benefits of cross-
mirroring due to nearness will be lost to the remote site.
Performance benefits of cross mirroring due to load leveling will
still accrue.
3.5.3 Non-Dedicated Servers
In some installations one (or both) of the servers is enlisted for
double duty - as a server for the network and as a workstation.
In this case, the network is split by definition. If the cable
attached to the server/workstation fails or is knocked loose, the
workstation partition will lose the other server but can still
update its resident server. The other workstations will lose the
isolated server but can still update the other server. This
configuration can be thought of as introducing an additional
termination to the topology, with a workstation at the end.
3.5.4 Dealing With The Problem
Some subset of the universe of installations will be, for varying
reasons, unable to avoid the occurrence of a split network.
Most, if not all of these reasons will be physical. Most of the
members of this subset, however, will not be unduly
discommoded by a split network, or at least will find the benefits
of mirroring to outweigh the inconveniences brought about by a
network which has been split due to partial server inaccessibility.
3.5.4.1 Defining The Problem
The danger inherent in a split network is, as described above,
that separate groups of clients can be updating separate
servers, creating two versions of the data base.
3.5.4.1.1 Lost Updates
The updates made to one of the servers will be lost during Recovery.
3.5.4.1.2 Data Base Corruption
It is possible, depending on the nature of activity against the
data base, that it will be corrupted.
3.5.4.1.3 Incomplete Data
Incomplete data is defined as data that is correct in itself, but
which is not as rich in information as it could be. Users on
different sides of the split may be working without the benefit of
updates from the other side. The severity of this situation
depends on the nature of the enterprise and the length of time
the split persists. In any case, by definition for this discourse,
it is considered better to be working with incomplete data than
not to be working at all. For example, in a mail advertising
campaign, it is better to send advertising copy to prospects
recently removed from the data base, or to fail to send to
prospects recently added, than to send no mail at all.
3.5.4.1.4 Erroneous Data
Erroneous data is defined as data that can lead to inappropriate
and damaging action. The damage can be to the data base or
directly to the operations of the enterprise it supports. Thus,
nearly simultaneous withdrawals from an account at different
branch offices of a bank which together, but not separately,
exceed the balance of an account can expose the bank to fraud
if the branches are on different sides of a split network.
Fortunately, there are several ways to completely avoid these
dangers. Some of these methods, however, will cause
inconvenience.
3.5.4.2 Removing The Danger
If you find it impossible to avoid a latent split network, it is
prudent to take measures which will guarantee that no danger
to your data base or operations exists. The key concept in
avoiding danger to the data base is that when the network
becomes split, updates to common data must be restricted to
only one of the servers.
3.5.4.2.1 Partition by Data
In the best of worlds, you will be able to partition your users
such that those on one side of the potential split are performing
activities against data completely unrelated to those on the other
side. Suppose, for instance, that the latent split situation is
forced on you by the geographic separation of offices, one of
which performs CAD/CAM operations related to engineering
design, and one of which performs accounting operations in
support of day-to-day operation of the business. In this case, a
loss of communications between the two offices, while creating a
split network, will not affect operations. Indeed, this situation
presents opportunities for performance enhancement through
the use of cross mirroring. Take note, however, that in this situation,
special Recovery procedures must be followed after re-establishing the
link between the two sides of the split. In short, two Recovery jobs must be
run, one for each of the partitions, and each in a different
direction.
3.5.4.2.2 Partition by Access
If your users can be partitioned such that those on one side of
the split are only reading the data base, the situation is almost
as good as if they were partitioned by data. Thus, if the link is
broken, they can continue operations with no danger to the data
base, albeit with slightly out of date data. The mail advertising
operation can again be used as an example. One site may
have the responsibility of updating the data base of addresses
while the other site is reading the data base to support their
mailings.
3.5.4.2.3 Partition by Expendability
If you are not fortunate enough to be able to partition by data or
access, you will have to bite the bullet and forcibly restrict
access to the data base by one of the sides of the split.
Operational measures must be taken to ensure that only one
side is updating. The easiest fool-proof way to do this is as
follows:
o Choose which side is expendable.
o Invoke No*STOP NETWORK for the expendable side
with message redirection, changing the default
response to a drive failure from "DROP AND
CONTINUE" to "ABORT".
o After they have aborted, you can, depending on your
operations, re-institute them as READ-ONLY users,
put them into transaction logging mode (for delayed
updates), put them to other tasks which do not
reference the data base, or leave them idle.
3.6 Performance Enhancement Notes
Some thought should be given to performance issues when
designing your network for mirroring. Although the overhead
imposed by No*STOP NETWORK is small, care should be taken
during the design phase to minimize it. In some cases
No*STOP NETWORK, judiciously employed, can actually
improve the performance of your network. The following
guidelines all spring from the fact that No*STOP NETWORK
does not mirror READs.
3.6.1 Read From The Faster Server
If your servers are of unequal speeds, make the faster server
your Primary when declaring drive pairs. In this way you will be
retrieving data from the most efficient resource.
3.6.2 Bring Your Data Closer
There is an opportunity in certain types of installations to
significantly improve retrieval times for all or some of the users.
If, for instance, their work consists of extensive searches of a
data base, perhaps correlating data into information via repeated
complex queries, this work could be sped up significantly by
doing those queries on local equipment. If the users have
sufficient local resources, they can download the data base and
supporting files such as indexes to their local hard drive and
declare it (or a subdirectory of it) the Primary. In this way, all
READs are performed locally without reference to the network.
This can speed up response at the workstation while at the
same time reducing traffic on the network. An even more
dramatic improvement is possible if a workstation RAM drive can
be declared the Primary.
This approach is, of course, not for all enterprises. The largest
concern is that it creates a latent split network by mimicking a
non-dedicated server at each workstation using it.
It should also be noted that the users addressing their
local resources will not have access to updates performed since
their most recent download. If this is not of paramount concern,
or if the data being accessed locally is private data, such as an
intelligence analyst's "shoebox" files, this approach can, with
careful planning, provide large benefits. Bear in mind that the
flexibility of No*STOP NETWORK will allow you to define drive
pairs such that one or more pairs can support the local activity
(e.g., C: to F:), while other pairs can support access to a
common data base (e.g., L: to M:), possibly to send updates
based on investigation of private files. If the users performing
local processing wish to have their private files maintained on
two file servers for maximum safety, No*STOP NETWORK-MM
can be used to support more than one Secondary server (e.g.,
C: to F: to G:).
3.6.3 Cross Mirroring to Level the Load
If you have purchased a second file server in order to gain the
benefits of Level 3 Fault Tolerance, you may be pleasantly
surprised to discover that yours may be the type of enterprise
which can derive performance benefits from this extra server.
If your users can be partitioned according to the data they use,
you can cross mirror so that one group of users is doing its
READ accesses on one server, while the other group is doing
them on the other server, thus providing a form of parallel
access to the network for retrieval activity. For example, if you
have a user group performing CAD/CAM activities in support of
engineering design, and another group of users performing
corporate accounting activities, they are more than likely
accessing completely disparate data, having no data in
common. In this case there is no danger of data corruption, and
whereas before one server was doing all the READing, now
another server is enlisted to share the load. To accomplish
cross mirroring in the example cited, simply turn
No*STOP NETWORK on for the engineers, mirroring from, for
example, F: to G:, and, for the accountants, from G: to F:.